Enhancing Large Language Models for Automated Homework Assessment in Undergraduate Circuit Analysis
Chen, Liangliang, Xie, Huiru, Qin, Zhihao, Guo, Yiming, Rohde, Jacqueline, Zhang, Ying
This research full paper presents an enhancement pipeline for large language models (LLMs) in assessing homework for an undergraduate circuit analysis course, aiming to improve LLMs' capacity to provide personalized support to electrical engineering students. Existing evaluations have demonstrated that GPT-4o possesses promising capabilities in assessing student homework in this domain. Building on these findings, we enhance GPT-4o's performance through multi-step prompting, contextual data augmentation, and the incorporation of targeted hints. These strategies effectively address common errors observed in GPT-4o's responses when using simple prompts, leading to a substantial improvement in assessment accuracy. Specifically, the correct response rate for GPT-4o increases from 74.71% to 97.70% after applying the enhanced prompting and augmented data on entry-level circuit analysis topics. This work lays a foundation for the effective integration of LLMs into circuit analysis instruction and, more broadly, into engineering education.
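A multi-step assessment pipeline of the kind the abstract describes can be sketched as follows. This is a hypothetical illustration, not the authors' actual prompts: `call_llm` is a stand-in for any chat-completion client (e.g., a GPT-4o call), and the two-stage structure, hint text, and grading instructions are assumptions.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; replace with an actual client."""
    return f"[model response to {len(prompt)} chars of prompt]"

def assess_submission(problem, reference_solution, student_work, hints):
    # Step 1: have the model solve the problem itself, with targeted topic
    # hints prepended as contextual augmentation (hint wording is illustrative).
    solve_prompt = (
        "Hints:\n" + "\n".join(hints)
        + f"\n\nSolve this circuit analysis problem step by step:\n{problem}"
    )
    model_solution = call_llm(solve_prompt)
    # Step 2: grade the student's work against both the instructor reference
    # and the model's own worked solution, asking for a per-step verdict.
    grade_prompt = (
        f"Reference solution:\n{reference_solution}\n\n"
        f"Worked solution:\n{model_solution}\n\n"
        f"Student work:\n{student_work}\n\n"
        "For each step, state whether the student is correct and why."
    )
    return call_llm(grade_prompt)

verdict = assess_submission(
    "Find the current i through R1.",
    "i = 2 A",
    "i = 2 A by Ohm's law",
    ["State KVL sign conventions before summing loop voltages."],
)
```

Splitting "solve" from "grade" is what lets hints target the solving stage, where simple single-prompt grading tends to go wrong.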
Evaluation of Multi- and Single-objective Learning Algorithms for Imbalanced Data
Wojciechowski, Szymon, Woźniak, Michał
Many machine learning tasks aim to find models that work well not for a single criterion but for a group of criteria, often opposing ones. One such example is imbalanced data classification, where we want to achieve the best possible classification quality for the minority class without degrading the classification quality of the majority class. One solution is to propose an aggregate learning criterion and reduce the multi-objective learning task to a single-criterion optimization problem. Unfortunately, such an approach suffers from ambiguity of interpretation, since the value of the aggregated criterion does not reveal the values of the component criteria. Hence, there are more and more proposals for algorithms based on multi-objective optimization (MOO), which can simultaneously optimize multiple criteria. However, such an approach yields a set of multiple non-dominated solutions (a Pareto front). Selecting a single solution from the Pareto front is a challenge in itself, and much attention is paid to how to select it according to user preferences, as well as how to compare solutions returned by different MOO algorithms. Thus, a significant gap has been identified in classifier evaluation methodology: how to reliably compare methods that return single solutions with algorithms that return solutions in the form of Pareto fronts. To fill this gap, this article proposes a new, reliable way of comparing multi-objective algorithms with methods that return single solutions, while pointing out solutions from a Pareto front tailored to the user's preferences. This work focuses only on algorithm comparison, not their learning; the algorithms selected for this study are illustrative, chosen to help understand the proposed approach.
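The two steps at the heart of this setup, extracting the non-dominated set and then picking one front member according to user preferences, can be sketched in a few lines. This is a minimal illustration, not the paper's method: the two criteria (e.g., minority-class and majority-class recall), the candidate scores, and the distance-to-preference-point selection rule are all assumptions.

```python
def pareto_front(points):
    """Non-dominated subset of (criterion_1, criterion_2) pairs, both maximized."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]

def select_by_preference(front, preference):
    """Pick the front member closest (squared Euclidean) to a preference point."""
    return min(front, key=lambda p: (p[0] - preference[0]) ** 2
                                    + (p[1] - preference[1]) ** 2)

# Five hypothetical classifiers scored on the two opposing criteria.
candidates = [(0.60, 0.95), (0.70, 0.90), (0.80, 0.80), (0.75, 0.70), (0.90, 0.60)]
front = pareto_front(candidates)   # (0.75, 0.70) is dominated by (0.80, 0.80)
best = select_by_preference(front, (1.0, 1.0))
```

The selection rule is exactly where user preferences enter: a preference point weighted toward the minority class would pull `best` toward the high-minority end of the front.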
AgentEvolver: Towards Efficient Self-Evolving Agent System
Zhai, Yunpeng, Tao, Shuchang, Chen, Cheng, Zou, Anni, Chen, Ziqian, Fu, Qingxu, Mai, Shinji, Yu, Li, Deng, Jiaji, Cao, Zouying, Liu, Zhaoyang, Ding, Bolin, Zhou, Jingren
Autonomous agents powered by large language models (LLMs) have the potential to significantly enhance human productivity by reasoning, using tools, and executing complex tasks in diverse environments. However, current approaches to developing such agents remain costly and inefficient, as they typically require manually constructed task datasets and reinforcement learning (RL) pipelines with extensive random exploration. These limitations lead to prohibitively high data-construction costs, low exploration efficiency, and poor sample utilization. To address these challenges, we present AgentEvolver, a self-evolving agent system that leverages the semantic understanding and reasoning capabilities of LLMs to drive autonomous agent learning. AgentEvolver introduces three synergistic mechanisms: (i) self-questioning, which enables curiosity-driven task generation in novel environments, reducing dependence on handcrafted datasets; (ii) self-navigating, which improves exploration efficiency through experience reuse and hybrid policy guidance; and (iii) self-attributing, which enhances sample efficiency by assigning differentiated rewards to trajectory states and actions based on their contribution. By integrating these mechanisms into a unified framework, AgentEvolver enables scalable, cost-effective, and continual improvement of agent capabilities. Preliminary experiments indicate that AgentEvolver achieves more efficient exploration, better sample utilization, and faster adaptation compared to traditional RL-based baselines.
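The self-attributing mechanism, differentiated rewards assigned by contribution rather than one scalar episode reward, can be illustrated with a toy credit-assignment rule. This is a hypothetical stand-in: the abstract does not specify the scheme, and the proportional split and contribution scores here (assumed to come from some judge of each step) are illustrative only.

```python
def attribute_rewards(episode_reward, contributions):
    """Split a trajectory-level reward across steps in proportion to
    non-negative contribution scores (uniform fallback if all are zero)."""
    total = sum(contributions)
    n = len(contributions)
    if total == 0:
        return [episode_reward / n] * n
    return [episode_reward * c / total for c in contributions]

# Four-step trajectory: the first step contributed nothing to success.
step_rewards = attribute_rewards(1.0, [0.0, 0.5, 0.3, 0.2])
```

Compared with broadcasting the same scalar to every step, a split like this gives the learner a denser, better-targeted signal per sample, which is the sample-efficiency point the abstract makes.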
Smarter Together: Creating Agentic Communities of Practice through Shared Experiential Learning
Tablan, Valentin, Taylor, Scott, Hurtado, Gabriel, Bernhem, Kristoffer, Uhrenholt, Anders, Farei, Gabriele, Moilanen, Karo
The transition from human-centric to agent-centric software development practices is disrupting existing knowledge sharing environments for software developers. Traditional peer-to-peer repositories and developer communities for shared technical knowledge and best practice have witnessed dramatic drops in participation in a short period of time. At the same time, agentic functional equivalents are yet to emerge, leaving AI agents, which already generate a significant proportion of all new software code produced, without access to repositories of valuable shared learning. In this paper, we introduce Spark, a novel shared agentic memory architecture designed to emulate the collective intelligence and know-how of human developer communities. Spark enables AI coding agents to both contribute to and draw from a persistent and continuously evolving experiential memory. Agents operating in the same general problem space use the Spark shared memory as a repository of new knowledge to achieve collective continual learning. We evaluate Spark as a coach for AI coding agents performing software development tasks. We demonstrate that recommendations made by Spark improve the quality of code generated by generic code generation models at varying sizes and capability tiers. Boosted by Spark, a small open-weights model with 30 billion parameters was able to match the code quality afforded by a much larger state-of-the-art model. Separately, we measure the intrinsic quality of recommendations generated by Spark against a wide range of criteria inspired by software development best practice, and achieve helpfulness levels of up to 98.2% in the top two (out of five) qualitative helpfulness bands.
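The contribute-and-retrieve loop of a shared experiential memory can be sketched minimally. This is an illustrative toy, not Spark's design: the keyword-overlap retrieval and the `SharedMemory` API are assumptions standing in for whatever indexing the real system uses.

```python
class SharedMemory:
    """Toy shared store: agents contribute lessons, others retrieve them."""

    def __init__(self):
        self.entries = []  # list of (keyword_set, lesson_text)

    def contribute(self, task_description, lesson):
        self.entries.append((set(task_description.lower().split()), lesson))

    def retrieve(self, task_description, k=2):
        # Score stored lessons by keyword overlap with the new task,
        # return the top-k that match at all.
        words = set(task_description.lower().split())
        scored = sorted(self.entries, key=lambda e: len(e[0] & words), reverse=True)
        return [lesson for kw, lesson in scored[:k] if kw & words]

mem = SharedMemory()
mem.contribute("parse csv file in python", "Prefer the csv module over manual splitting.")
mem.contribute("retry http request", "Use exponential backoff with jitter.")
advice = mem.retrieve("read a csv file")
```

The key property is that the second agent benefits from the first agent's experience without ever having attempted the earlier task, which is the collective continual learning the abstract describes.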
An MLCommons Scientific Benchmarks Ontology
Hawks, Ben, von Laszewski, Gregor, Sinclair, Matthew D., Colombo, Marco, Venkataraman, Shivaram, Jain, Rutwik, Jiang, Yiwei, Tran, Nhan, Fox, Geoffrey
Scientific machine learning research spans diverse domains and data modalities, yet existing benchmark efforts remain siloed and lack standardization. This fragmentation makes novel and transformative applications of machine learning to critical scientific use cases harder to pursue and leaves their pathways to impact unclear. This paper introduces an ontology for scientific benchmarking developed through a unified, community-driven effort that extends the MLCommons ecosystem to cover physics, chemistry, materials science, biology, climate science, and more. Building on prior initiatives such as XAI-BENCH, FastML Science Benchmarks, PDEBench, and the SciMLBench framework, our effort consolidates a large set of disparate benchmarks and frameworks into a single taxonomy of scientific, application, and system-level benchmarks. New benchmarks can be added through an open submission workflow coordinated by the MLCommons Science Working Group and evaluated against a six-category rating rubric that promotes and identifies high-quality benchmarks, enabling stakeholders to select benchmarks that meet their specific needs. The architecture is extensible, supporting future scientific and AI/ML motifs, and we discuss methods for identifying emerging computing patterns for unique scientific workloads. The MLCommons Science Benchmarks Ontology provides a standardized, scalable foundation for reproducible, cross-domain benchmarking in scientific machine learning. A companion webpage for this work has also been developed as the effort evolves: https://mlcommons-science.github.io/benchmark/
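The rubric-based evaluation step of the submission workflow can be sketched as simple validation-plus-aggregation. The six category names and the 0-5 mean-score rule below are illustrative assumptions, not the working group's actual rubric.

```python
# Hypothetical rubric categories; the real six categories are defined by the
# MLCommons Science Working Group and may differ.
RUBRIC_CATEGORIES = [
    "scientific_relevance", "data_availability", "reproducibility",
    "documentation", "metrics_clarity", "system_portability",
]

def rate_benchmark(scores):
    """scores: dict mapping each rubric category to an integer 0-5.
    Returns the mean rating; rejects incomplete or out-of-range submissions."""
    missing = set(RUBRIC_CATEGORIES) - scores.keys()
    if missing:
        raise ValueError(f"unscored categories: {sorted(missing)}")
    for cat, s in scores.items():
        if not 0 <= s <= 5:
            raise ValueError(f"{cat}: score {s} outside 0-5")
    return sum(scores[c] for c in RUBRIC_CATEGORIES) / len(RUBRIC_CATEGORIES)

rating = rate_benchmark(dict(zip(RUBRIC_CATEGORIES, [5, 4, 4, 3, 5, 3])))
```

Forcing every category to be scored before a rating is produced is what lets stakeholders compare submissions on a common footing.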
High-dimensional Bayesian filtering through deep density approximation
In this work, we benchmark two recently developed deep density methods for nonlinear filtering. Starting from the Fokker--Planck equation with Bayes updates, we model the filtering density of a discretely observed SDE. The two filters, the deep splitting filter and the deep BSDE filter, are both based on Feynman--Kac formulas, Euler--Maruyama discretizations, and neural networks. Both methods are extended to logarithmic formulations, providing sound and robust implementations as the state dimension increases. We benchmark the methods against classical particle filters and ensemble Kalman filters on numerous examples. In the low-dimensional examples the particle filters work well, but when we scale up to a partially observed 100-dimensional Lorenz-96 model the particle-based methods fail and the logarithmic deep density method prevails. In terms of computational efficiency, the deep density methods reduce inference time by roughly two to five orders of magnitude relative to the particle-based filters.
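For context, the classical baseline being compared against, a bootstrap particle filter for a discretely observed SDE, fits in a few lines: propagate particles through an Euler--Maruyama step, weight by the observation likelihood, resample (the Bayes update). The 1D Ornstein--Uhlenbeck model and all parameter values below are illustrative, not taken from the paper.

```python
import math
import random

random.seed(0)
theta, sigma, dt, obs_noise = 1.0, 0.5, 0.1, 0.2
N = 1000  # number of particles

def propagate(x):
    # Euler-Maruyama step for dX = -theta * X dt + sigma dW
    return x - theta * x * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)

def filter_step(particles, y):
    # 1. propagate each particle through the SDE discretization
    particles = [propagate(x) for x in particles]
    # 2. weight by the Gaussian observation likelihood p(y | x)
    weights = [math.exp(-0.5 * ((y - x) / obs_noise) ** 2) for x in particles]
    total = sum(weights)
    # 3. multinomial resampling: the Bayes update on the particle cloud
    return random.choices(particles, weights=[w / total for w in weights], k=len(particles))

particles = [random.gauss(0, 1) for _ in range(N)]
for y in [0.8, 0.6, 0.5]:
    particles = filter_step(particles, y)
mean_estimate = sum(particles) / N  # posterior mean estimate, near the observations
```

In high dimension the likelihood weights in step 2 degenerate (one particle takes nearly all the mass), which is precisely the failure mode the deep density methods are designed to avoid.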
Comparing EPGP Surrogates and Finite Elements Under Degree-of-Freedom Parity
Amo, Obed, Ghosh, Samit, Lange-Hegermann, Markus, Raiţă, Bogdan, Pokojovy, Michael
We present a new benchmarking study comparing a boundary-constrained Ehrenpreis--Palamodov Gaussian Process (B-EPGP) surrogate with a classical finite element method combined with Crank--Nicolson time stepping (CN-FEM) for solving the two-dimensional wave equation with homogeneous Dirichlet boundary conditions. The B-EPGP construction leverages exponential-polynomial bases derived from the characteristic variety to enforce the PDE and boundary conditions exactly and employs penalized least squares to estimate the coefficients. To ensure fairness across paradigms, we introduce a degrees-of-freedom (DoF) matching protocol. Under matched DoF, B-EPGP consistently attains lower space-time $L^2$-error and maximum-in-time $L^{2}$-error in space than CN-FEM, improving accuracy by roughly two orders of magnitude.
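The two error metrics named above, space-time $L^2$-error and maximum-in-time spatial $L^2$-error, can be computed straightforwardly once the error is sampled on a regular space-time grid. This is a hedged sketch, not the paper's code: the uniform-grid Riemann-sum quadrature is a simplifying assumption.

```python
import math

def l2_space(err_slice, dx):
    """Spatial L2 norm of the error at one time level (2D uniform grid)."""
    return math.sqrt(sum(e * e for row in err_slice for e in row) * dx * dx)

def spacetime_l2(err, dx, dt):
    """Space-time L2 error: squared spatial norms integrated over time."""
    return math.sqrt(sum(l2_space(s, dx) ** 2 for s in err) * dt)

def max_in_time_l2(err, dx):
    """Maximum over time levels of the spatial L2 error."""
    return max(l2_space(s, dx) for s in err)

# Toy check: constant error 0.5 on a 2x2 grid over 4 time levels.
err = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(4)]
st = spacetime_l2(err, dx=0.5, dt=0.25)
mt = max_in_time_l2(err, dx=0.5)
```

Under DoF parity, evaluating both surrogates with the same pair of metrics on the same grid is what makes the "two orders of magnitude" comparison meaningful.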
A data free neural operator enabling fast inference of 2D and 3D Navier Stokes equations
Choi, Junho, Chang, Teng-Yuan, Kim, Namjung, Hong, Youngjoon
Ensemble simulations of high-dimensional flow models (e.g., Navier-Stokes-type PDEs) are computationally prohibitive for real-time applications. Neural operators enable fast inference but are limited by costly data requirements and poor generalization to 3D flows. We present a data-free operator network for the Navier-Stokes equations that eliminates the need for paired solution data and enables robust, real-time inference for large ensemble forecasting. The physics-grounded architecture takes initial and boundary conditions as well as forcing functions, yielding solutions robust to high variability and perturbations. Across 2D benchmarks and 3D test cases, the method surpasses prior neural operators in accuracy and, for ensembles, achieves greater efficiency than conventional numerical solvers. Notably, it delivers accurate solutions of the three-dimensional Navier-Stokes equations--a regime not previously demonstrated for data-free neural operators. By uniting a numerically grounded architecture with the scalability of machine learning, this approach establishes a practical pathway toward data-free, high-fidelity PDE surrogates for end-to-end scientific simulation and prediction.

Solving PDEs efficiently and accurately is one of the central interests for science and engineering. In addition, when dealing with various boundary conditions, initial conditions, or external forcing terms of PDEs in fields such as fluid mechanics [1-3], materials science [4, 5], weather forecasting [6, 7], and design optimization [8, 9], PDEs are often required to be solved repeatedly. However, conventional numerical solvers become prohibitively expensive in such settings, particularly for three-dimensional incompressible Navier-Stokes equations (NSEs) [10, 11]. This is because these solvers rely on spatial-temporal discretization and iterative treatment of nonlinear terms, while performing time marching that demands substantial memory and computation.

Moreover, they are not well suited for solving large ensembles of scenarios simultaneously, such as those required for uncertainty quantification or design exploration. The resulting computational time, coupled with the need for extensive sampling in ensemble or probabilistic simulations, constitutes a critical bottleneck [7, 12].